SQL can get complex when you join tables with non-unique join keys. Consider a financials system with tables account, payment, and receipt. An account can have many payments and many receipts. Any query that selects payments and receipts together will produce a cross product of the payments and receipts for each account.
For example:
SELECT a.account_no, sum(payment_amt), sum(receipt_amt)
FROM account a, payment p, receipt r
WHERE a.account_no = p.account_no
AND a.account_no = r.account_no
GROUP BY a.account_no;
This SQL is not going to work. For a given account with 5 payments and 3 receipts, the join would produce 5x3=15 rows before the GROUP BY aggregates them to one. The payment total will be 3 times higher than it should be, and the receipt total will be 5 times higher.
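As a sketch of the fan-out, the query can be run against an in-memory SQLite database. The table and column names come from the text; the amounts (5 payments of 10, 3 receipts of 20) are invented for illustration.

```python
import sqlite3

# Invented demo data: one account, 5 payments of 10 and 3 receipts of 20,
# so the true totals are 50 and 60.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE account (account_no INTEGER);
    CREATE TABLE payment (account_no INTEGER, payment_amt REAL);
    CREATE TABLE receipt (account_no INTEGER, receipt_amt REAL);
    INSERT INTO account VALUES (1);
""")
con.executemany("INSERT INTO payment VALUES (1, ?)", [(10,)] * 5)
con.executemany("INSERT INTO receipt VALUES (1, ?)", [(20,)] * 3)

row = con.execute("""
    SELECT a.account_no, sum(payment_amt), sum(receipt_amt)
    FROM account a, payment p, receipt r
    WHERE a.account_no = p.account_no
    AND a.account_no = r.account_no
    GROUP BY a.account_no
""").fetchone()
print(row)  # (1, 150.0, 300.0) -- 3x and 5x the true totals of 50 and 60
```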
One way to resolve this in Oracle versions prior to 7.3 is as follows:
SELECT a.account_no
     , sum(payment_amt) / count(distinct r.rowid)
     , sum(receipt_amt) / count(distinct p.rowid)
FROM account a, payment p, receipt r
WHERE a.account_no = p.account_no
AND a.account_no = r.account_no
GROUP BY a.account_no;
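The divide-by-distinct-rowid trick can be checked with the same kind of sketch. SQLite tables also carry a rowid, so it reproduces here; the data (5 payments of 10, 3 receipts of 20 on one account) is invented for illustration.

```python
import sqlite3

# Invented data: one account, 5 payments of 10, 3 receipts of 20.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE account (account_no INTEGER);
    CREATE TABLE payment (account_no INTEGER, payment_amt REAL);
    CREATE TABLE receipt (account_no INTEGER, receipt_amt REAL);
    INSERT INTO account VALUES (1);
""")
con.executemany("INSERT INTO payment VALUES (1, ?)", [(10,)] * 5)
con.executemany("INSERT INTO receipt VALUES (1, ?)", [(20,)] * 3)

# Each payment row is repeated once per receipt, so dividing the inflated
# payment sum by the number of distinct receipt rowids undoes the fan-out
# (and vice versa for receipts).
row = con.execute("""
    SELECT a.account_no
         , sum(payment_amt) / count(distinct r.rowid)
         , sum(receipt_amt) / count(distinct p.rowid)
    FROM account a, payment p, receipt r
    WHERE a.account_no = p.account_no
    AND a.account_no = r.account_no
    GROUP BY a.account_no
""").fetchone()
print(row)  # (1, 50.0, 60.0) -- the true totals
```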
Although this seems very clever, it is inefficient and difficult to read - particularly if extended to a third detail table. A better solution is:
SELECT a.account_no, p.payment_total, r.receipt_total
FROM account a
    ,(SELECT account_no, sum(receipt_amt) AS receipt_total
      FROM receipt
      GROUP BY account_no) r
    ,(SELECT account_no, sum(payment_amt) AS payment_total
      FROM payment
      GROUP BY account_no) p
WHERE a.account_no = r.account_no
AND a.account_no = p.account_no;
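SQLite also supports inline views, so the same sketch verifies this form. The data (one account, 5 payments of 10, 3 receipts of 20) is again invented for illustration.

```python
import sqlite3

# Invented data: one account, 5 payments of 10, 3 receipts of 20.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE account (account_no INTEGER);
    CREATE TABLE payment (account_no INTEGER, payment_amt REAL);
    CREATE TABLE receipt (account_no INTEGER, receipt_amt REAL);
    INSERT INTO account VALUES (1);
""")
con.executemany("INSERT INTO payment VALUES (1, ?)", [(10,)] * 5)
con.executemany("INSERT INTO receipt VALUES (1, ?)", [(20,)] * 3)

# Each inline view aggregates its table down to one row per account_no
# BEFORE the join, so the outer join is one-to-one and no fan-out occurs.
row = con.execute("""
    SELECT a.account_no, p.payment_total, r.receipt_total
    FROM account a
        ,(SELECT account_no, sum(receipt_amt) AS receipt_total
          FROM receipt GROUP BY account_no) r
        ,(SELECT account_no, sum(payment_amt) AS payment_total
          FROM payment GROUP BY account_no) p
    WHERE a.account_no = r.account_no
    AND a.account_no = p.account_no
""").fetchone()
print(row)  # (1, 50.0, 60.0)
```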
What happens if it is not this simple? Above, the non-unique joins have been made unique using aggregation. What if that is not possible?
Consider a system with tables job, job_status_hist, and invoice. job_status_hist records the changes in status of a job over time - eg. (O)pen, (C)losed, (R)eopened. A job may have many invoices, which are sent out periodically over the duration of a job. A report is required that shows the initial cost of a job (invoices sent between the original Open status record and the first Closed status record), and the total cost of a job (the total of all invoices).
This is quite possible to produce in SQL, but the query would probably be large, convoluted, and difficult to read. Furthermore, minor changes in the business rules (eg. special coding to handle jobs closed in error and immediately reopened) would be difficult to handle. The logical solution is to code the report procedurally. An algorithm might be:
For each job:
- Select the dates of the original Open and first Closed status records from job_status_hist
- Select the initial cost from invoice, for invoices dated between those two dates
- Select the total cost from invoice
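The per-job loop above can be sketched against an in-memory SQLite database. The table names and the job_no, status_date, and invoice_date columns come from the text; the status and invoice_amt columns, and all data values, are invented for illustration.

```python
import sqlite3

# Hypothetical schema and data for one job: opened 01 Jan, closed 01 Feb,
# reopened 01 Mar, with three invoices (100 + 50 + 25).
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE job (job_no INTEGER);
    CREATE TABLE job_status_hist (job_no INTEGER, status TEXT, status_date TEXT);
    CREATE TABLE invoice (job_no INTEGER, invoice_date TEXT, invoice_amt REAL);
    INSERT INTO job VALUES (1);
    INSERT INTO job_status_hist VALUES
        (1, 'O', '2024-01-01'), (1, 'C', '2024-02-01'), (1, 'R', '2024-03-01');
    INSERT INTO invoice VALUES
        (1, '2024-01-15', 100), (1, '2024-02-15', 50), (1, '2024-03-10', 25);
""")

report = []
for (job_no,) in con.execute("SELECT job_no FROM job").fetchall():
    # Several separate queries PER JOB -- this is the inefficiency at volume.
    open_date, = con.execute(
        "SELECT min(status_date) FROM job_status_hist WHERE job_no = ? AND status = 'O'",
        (job_no,)).fetchone()
    close_date, = con.execute(
        "SELECT min(status_date) FROM job_status_hist WHERE job_no = ? AND status = 'C'",
        (job_no,)).fetchone()
    initial, = con.execute(
        "SELECT coalesce(sum(invoice_amt), 0) FROM invoice "
        "WHERE job_no = ? AND invoice_date BETWEEN ? AND ?",
        (job_no, open_date, close_date)).fetchone()
    total, = con.execute(
        "SELECT coalesce(sum(invoice_amt), 0) FROM invoice WHERE job_no = ?",
        (job_no,)).fetchone()
    report.append((job_no, initial, total))
print(report)  # [(1, 100.0, 175.0)] -- only the 100 falls in the Open..Closed window
```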
Simple to code and simple to read. But very inefficient if the number of jobs is large. The SQLs or Cursors required to get the dates and costs inside the loop must be executed separately for every job.
A better way to do this is with Concurrent Cursors. Instead of running independent SQLs inside the loop for every job, we open three cursors at the beginning - one on each table - and process them in job number order.
For example:
- job cursor, order by job_no
- job_status_hist cursor, order by job_no and status_date
- invoice cursor, order by job_no and invoice_date

This technique has allowed us to process high volumes of data transactionally without resorting to indexes. It is only marginally less efficient than a single SQL performing Hash or Sort-Merge joins (not that a single SQL would be possible in this example).
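A minimal sketch of the concurrent-cursor technique, using the same hypothetical job schema (status and invoice_amt columns and all data values are invented). Each detail cursor keeps a one-row lookahead, and because all three cursors are sorted by job_no, each is simply drained up to the current job as the job cursor advances.

```python
import sqlite3

# Hypothetical schema and data: two jobs with status history and invoices.
con = sqlite3.connect(":memory:")
con.executescript("""
    CREATE TABLE job (job_no INTEGER);
    CREATE TABLE job_status_hist (job_no INTEGER, status TEXT, status_date TEXT);
    CREATE TABLE invoice (job_no INTEGER, invoice_date TEXT, invoice_amt REAL);
    INSERT INTO job VALUES (1), (2);
    INSERT INTO job_status_hist VALUES
        (1, 'O', '2024-01-01'), (1, 'C', '2024-02-01'), (1, 'R', '2024-03-01'),
        (2, 'O', '2024-01-05'), (2, 'C', '2024-01-20');
    INSERT INTO invoice VALUES
        (1, '2024-01-15', 100), (1, '2024-02-15', 50), (1, '2024-03-10', 25),
        (2, '2024-01-10', 40);
""")

# Open all three cursors ONCE, each sorted so they advance in job_no order.
job_cur = con.execute("SELECT job_no FROM job ORDER BY job_no")
hist_cur = con.execute("SELECT job_no, status, status_date "
                       "FROM job_status_hist ORDER BY job_no, status_date")
inv_cur = con.execute("SELECT job_no, invoice_date, invoice_amt "
                      "FROM invoice ORDER BY job_no, invoice_date")

hist_row = hist_cur.fetchone()   # one-row lookahead per detail cursor
inv_row = inv_cur.fetchone()
report = []
for (job_no,) in job_cur.fetchall():
    open_date = close_date = None
    while hist_row and hist_row[0] == job_no:   # drain this job's status history
        _, status, status_date = hist_row
        if status == 'O' and open_date is None:
            open_date = status_date
        elif status == 'C' and close_date is None:
            close_date = status_date
        hist_row = hist_cur.fetchone()
    initial = total = 0.0
    while inv_row and inv_row[0] == job_no:     # drain this job's invoices
        _, invoice_date, amt = inv_row
        total += amt
        if open_date and close_date and open_date <= invoice_date <= close_date:
            initial += amt
        inv_row = inv_cur.fetchone()
    report.append((job_no, initial, total))
print(report)  # [(1, 100.0, 175.0), (2, 40.0, 40.0)]
```

Because every table is read exactly once in sorted order, this is effectively a hand-rolled sort-merge join, with the procedural business rules applied as each job's rows stream past.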